-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[voq/systemlag] Voq system lag functionality #1605
[voq/systemlag] Voq system lag functionality #1605
Conversation
@ysmanman (Arista) - could you please review this PR, thanks. |
@@ -1362,9 +1363,13 @@ bool IntfsOrch::isRemoteSystemPortIntf(string alias) | |||
Port port; | |||
if(gPortsOrch->getPort(alias, port)) | |||
{ | |||
|
|||
if (port.m_type == Port::LAG) | |||
return(port.m_system_lag_info.switch_id != gVoqMySwitchId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the statement in {} to be consistent with the rest of the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
o.k. I'll fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
orchagent/intfsorch.cpp
Outdated
if (port.m_type == Port::LAG) | ||
{ | ||
if (port.m_system_lag_info.switch_id != gVoqMySwitchId) | ||
{ | ||
return; | ||
} | ||
alias = port.m_system_lag_info.alias; | ||
} | ||
else | ||
{ | ||
return; | ||
if(port.m_system_port_info.type == SAI_SYSTEM_PORT_TYPE_REMOTE) | ||
{ | ||
return; | ||
} | ||
alias = port.m_system_port_info.alias; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code is duplicated in another function like voqSyncDelIntf too . Can we wrap this into a helper method? This can avoid duplicating code and also make it easy to extend it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done intentionally. This repeating will be there in other places also. The intention of this logic is to get whether the port or the lag is remote or not. Since we need system alias which in the Port structure,, we get the Port structure first. Since the port structure is already available and it also has the type info, I check the type in the port structure itself instead of calling another API.
//Already created, syncd local lag from CHASSIS_APP_DB. Skip | ||
it = consumer.m_toSync.erase(it); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it always safe to simply skip here? Is it possible this is to retry a lag that was failing to add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We should always skip this. This lag is the Portchannel locally created already and synced to CHASSIS_APP_DB. Since this api, which is used for both local lags and remote lags, this is invoked for entries added to CHASSIS_APP_DB also. Here we identify the local entry by the switch id. If it is local entry, since this local is already programmed, we should always skip this and erase it. So there will not be re-try at all.
@@ -3909,7 +4051,7 @@ bool PortsOrch::addLagMember(Port &lag, Port &port, bool enableForwarding) | |||
attr.value.oid = port.m_port_id; | |||
attrs.push_back(attr); | |||
|
|||
if (!enableForwarding) | |||
if (!enableForwarding && port.m_type != Port::SYSTEM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean we cannot disable forwarding for sys lag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting (enable or disable) the ingress and egress is not applicable for remote LAG members. SAI returns errors when we set these attributes for system port lag members These are done at the owner ASIC (as local physical port members)
8d9effc
to
9e3b446
Compare
3c8e4b0
to
ebdba8c
Compare
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
@vganesan-nokia There are some build errors can you take a look. Let me know if you are not able to check the link.
|
This is expected. We need the lag id allocator PR #1603 to be merged for this to build successfully |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
system_lag_alias = gMyHostName + "|" + gMyAsicName + "|" + lag_alias; | ||
|
||
// Allocate unique lag id | ||
spa_id = m_lagIdAllocator->lagIdAdd(system_lag_alias, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently Sonic in the teammgrd we don't set a unique oper key in the LACP BPDU, but we are in the process of adding this support (#1660)
We could use this unique id allocation schema there as well. The change there would be to allocate the lagId in teammgrd when user creates a new LAG .. and here in orchagent it should give back the same id as it should be already existing ( created before in teammgrd and updated in SYSTEM LAG TABLE.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The need for this unique lag id allocation is to have unique lag id for all port channels across all the asics. So we let the id allocation be done in centralized location. As you mentioned, it will be interesting use case to use uniquely allocated id as key for LACP. This will work well for multi-asic and distributed chassis. For single asic, we need to set the id boundaries and load the script in local redis server. Yes, the lag id allocator gives the same id for a given portchannel, if it is already allocated,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes in future we would need to extend this to teamd, so that we use this logic with the LUA script itself to generate a unique LACP key filled in the LACP BPDU.
Capturing review approval from @ysmanman before rebasing |
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Changes for voq system lag implementation (1) Portsorch changes for allocating unique lag id from chassis ap pdb and sending the id in system port aggregator id attribute while creating lag local LAG (2) Portsorch changes to synd local LAG and local LAG members to chassis app db. The sync-ing includes the allocated unique system lag id (3) Portsorch changes to process remote system lag from chassis app db and create lag entry in local asic db with received system lag id (4) Interface orch changes to identify local or remote interfaces (for both port and lag) (5) Orchdaemon changes in orchagent intialization to get hostname and asic_name attributes from DEVICE_METATDATA. These are used for unique system lag name derivation
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Test case added for testing system lag
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Changes done to do the voqSyncDelLag() only for the local lag when lag is removed.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> While adding system port as member for a system port channel, pvid is set. Since pvid setting is not applicable for systemports in remote asics, SAI returns faiure. This results in display of error. setPortPvid() api is modified to skip setting the pvid for system ports and an INFO is displayed.
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Code review comments fix changes: (1)Enclosed single line if block by {} (2) Added system lag negative tests in vs tests and added code to log error on system lag id release failure to help vs tests. In addition to these review comments fixes, changes are done to remove redundant error logs after addNeighbor() and removeNeighbor() in system neighbor processing
ebdba8c
to
a00aa63
Compare
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
Hi @judyjoseph, 4 tests of fine grained next group test suite failed. I tried to debug this. But I am unable to simulate in my machine. For me all tests (372 tests) passed. I think there may be some timing issue. Would you please re-start the Azurepipelines again?. Thanks |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
* [voq/systemlag] VOQ System lag functionality Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Changes for voq system lag implementation (1) Portsorch changes for allocating unique lag id from chassis ap pdb and sending the id in system port aggregator id attribute while creating lag local LAG (2) Portsorch changes to synd local LAG and local LAG members to chassis app db. The sync-ing includes the allocated unique system lag id (3) Portsorch changes to process remote system lag from chassis app db and create lag entry in local asic db with received system lag id (4) Interface orch changes to identify local or remote interfaces (for both port and lag) (5) Orchdaemon changes in orchagent intialization to get hostname and asic_name attributes from DEVICE_METATDATA. These are used for unique system lag name derivation * [vog/systemlag] VS test for system lag
What I did
System LAG (PortChannel for chassis system) support
Why I did it
In VOQ based chassis systems, to forward across different ASICs through Portchannels configured in different asics, a new attribute called "system port aggregator id" (spa) is defined for the LAG objects. The system lag implementation in this PR allocates unique ID for Portchannel ans the id is sent to SAI to create the LAG with spa. This chassis wide unique spa is sync-ed with chassis app db so that non-owner asics will create corresponding remote LAGs with the same spa.
For more detailed information refer: Distributed VOQ LAG HLD sonic-net/SONiC#697
How I verified it
Details if related
This PR depends on the following PRs: